Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Time series based workload desc order optimization through reverse segment read #7244

Merged
merged 19 commits into from
Apr 26, 2023

Conversation

gashutos
Copy link
Contributor

@gashutos gashutos commented Apr 19, 2023

Description

EDIT -> After incorporating comments/suggestions from Nick, Andriya & Andorss, we decided to limit this change only to @timestamp field indices and also we changed default segment search order for these indices as Descending (from latest to the oldest segments), and also safe guarded rare ASC queries on time series based workload not to traverse on default DESC order to avoid any regression. The perf gains still remains similar.

As described in issue OpenSearch-6814, desc order sort queries are taking more time (3x or more) compare to asc order queries on same data set (time series based).
That is because all latest documents in time series workload are in latest segments when our IndexSearcher search documents in order of oldest segments to newest segments.

In this PR, we changed the leafReaderContext (segments) order in ContextIndexSearcher in reverse if query type is descending order sort. This will by default behaviour for all type of workloads and not just for time series based workloads. However it will only benefit to workload which has ever increasing data values. And for other type of workload, it wont regress or impact since data is randomly distributed across all segments.
But as a precaution, in case any user/workload gets search latency hit, I have added index level setting to disable this optimization.

Optimization gains

We are seeing 3x to 5x gains on below configuration.

  • On http_logs workload with disabled force merge, so each shard has around 100-150 segments.
  • instance type, r6g.2xlarge.
  • units are in ms.
Sort Type hits = 5 hits = 100 hits = 1000
Desc 150 462 621
Desc (with this change) 49 112 125

Issues Resolved

OpenSearch-6814

Next action item

On same segment of time series based workload, ASC queiry is faster compare to descending order. We need to look further to that.

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: gashutos <[email protected]>
@gashutos
Copy link
Contributor Author

@nknize @reta , check once you get time, if this appraoch looks good. The optimization gain is pretty good.

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@reta reta added the backport 2.x Backport to 2.x branch label Apr 27, 2023
@opensearch-trigger-bot
Copy link
Contributor

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/backport-2.x
# Create a new branch
git switch --create backport/backport-7244-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 4c98b3d38064b94a87f6d7a1359e623849459bac
# Push it to GitHub
git push --set-upstream origin backport/backport-7244-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-7244-to-2.x.

austintlee pushed a commit to austintlee/OpenSearch that referenced this pull request Apr 28, 2023
…gment read (opensearch-project#7244)

This commit changes the EngineConfig for timeseries indexes only (e.g., indexes that use 
the @timestamp metadata field) so that a descending LeafSorter comparator is used to 
visit segments in order of most newest to oldest. For the more infrequent case that a user 
chooses to sort query results by ASC time, this would cause a search regression so the 
ContextIndexSearcher is updated to inspect the sort order from the search request and 
reverse the comparator so segments are visited in ascending order. LeafSorter behavior 
for non-timeseries indexes is left the same. 

Signed-off-by: gashutos <[email protected]>
Signed-off-by: Chaitanya Gohel <[email protected]>
@gashutos gashutos deleted the main-timeseries branch May 5, 2023 06:48
gashutos added a commit to gashutos/OpenSearch that referenced this pull request May 8, 2023
…gment read (opensearch-project#7244)

This commit changes the EngineConfig for timeseries indexes only (e.g., indexes that use
the @timestamp metadata field) so that a descending LeafSorter comparator is used to
visit segments in order of most newest to oldest. For the more infrequent case that a user
chooses to sort query results by ASC time, this would cause a search regression so the
ContextIndexSearcher is updated to inspect the sort order from the search request and
reverse the comparator so segments are visited in ascending order. LeafSorter behavior
for non-timeseries indexes is left the same.

Signed-off-by: gashutos <[email protected]>
Signed-off-by: Chaitanya Gohel <[email protected]>
reta pushed a commit that referenced this pull request May 8, 2023
…gment read (#7244) (#7457)

This commit changes the EngineConfig for timeseries indexes only (e.g., indexes that use
the @timestamp metadata field) so that a descending LeafSorter comparator is used to
visit segments in order of most newest to oldest. For the more infrequent case that a user
chooses to sort query results by ASC time, this would cause a search regression so the
ContextIndexSearcher is updated to inspect the sort order from the search request and
reverse the comparator so segments are visited in ascending order. LeafSorter behavior
for non-timeseries indexes is left the same.

Signed-off-by: gashutos <[email protected]>
Signed-off-by: Chaitanya Gohel <[email protected]>
andrross added a commit to andrross/OpenSearch that referenced this pull request Jun 2, 2023
…verse segment read (opensearch-project#7244)"

This reverts commit 4c98b3d.

Reverting due to issue reported in opensearch-project#7878.
andrross added a commit to andrross/OpenSearch that referenced this pull request Jun 2, 2023
…verse segment read (opensearch-project#7244)"

This reverts commit 4c98b3d.

Reverting due to issue reported in opensearch-project#7878.

Signed-off-by: Andrew Ross <[email protected]>
andrross added a commit to andrross/OpenSearch that referenced this pull request Jun 2, 2023
…verse segment read (opensearch-project#7244)"

This reverts commit 4c98b3d.

Reverting due to issue reported in opensearch-project#7878.

Signed-off-by: Andrew Ross <[email protected]>
andrross added a commit to andrross/OpenSearch that referenced this pull request Jun 2, 2023
…verse segment read (opensearch-project#7244)"

This reverts commit 4c98b3d.

Reverting due to issue reported in opensearch-project#7878.

Signed-off-by: Andrew Ross <[email protected]>
andrross added a commit that referenced this pull request Jun 2, 2023
…verse segment read (#7244)" (#7892)

This reverts commit 4c98b3d.

Reverting due to issue reported in #7878.

Signed-off-by: Andrew Ross <[email protected]>
opensearch-trigger-bot bot pushed a commit that referenced this pull request Jun 2, 2023
…verse segment read (#7244)" (#7892)

This reverts commit 4c98b3d.

Reverting due to issue reported in #7878.

Signed-off-by: Andrew Ross <[email protected]>
(cherry picked from commit bb26536)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
andrross added a commit that referenced this pull request Jun 2, 2023
…verse segment read (#7244)" (#7893)

This reverts commit 4c98b3d.

Reverting due to issue reported in #7878.

Signed-off-by: Andrew Ross <[email protected]>
reta pushed a commit that referenced this pull request Jun 2, 2023
…tion through re… (#7895)

* Revert "Time series based workload desc order optimization through reverse segment read (#7244)" (#7892)

This reverts commit 4c98b3d.

Reverting due to issue reported in #7878.

Signed-off-by: Andrew Ross <[email protected]>
(cherry picked from commit bb26536)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Remove unused imports

Signed-off-by: Andrew Ross <[email protected]>

---------

Signed-off-by: Andrew Ross <[email protected]>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Andrew Ross <[email protected]>
gashutos added a commit to gashutos/OpenSearch that referenced this pull request Jun 8, 2023
gashutos added a commit to gashutos/OpenSearch that referenced this pull request Jun 8, 2023
andrross pushed a commit that referenced this pull request Jun 12, 2023
…gh reverse segment read (#7244)] with fixes (#7967)

* Revert "Revert "Time series based workload desc order optimization through reverse segment read (#7244)" (#7892)"

This reverts commit bb26536.

Signed-off-by: gashutos <[email protected]>

* Enable time series optimization only if it is not IndexSorted index, also ASC order reverse should only consider in @timestamp field

Signed-off-by: gashutos <[email protected]>

* Modifying CHANGELOG

Signed-off-by: gashutos <[email protected]>

* Adding integ test for scroll API where sort by _doc is getting early termination

Signed-off-by: gashutos <[email protected]>

---------

Signed-off-by: gashutos <[email protected]>
gashutos added a commit to gashutos/OpenSearch that referenced this pull request Jun 13, 2023
…gh reverse segment read (opensearch-project#7244)] with fixes (opensearch-project#7967)

* Revert "Revert "Time series based workload desc order optimization through reverse segment read (opensearch-project#7244)" (opensearch-project#7892)"

This reverts commit bb26536.

Signed-off-by: gashutos <[email protected]>

* Enable time series optimization only if it is not IndexSorted index, also ASC order reverse should only consider in @timestamp field

Signed-off-by: gashutos <[email protected]>

* Modifying CHANGELOG

Signed-off-by: gashutos <[email protected]>

* Adding integ test for scroll API where sort by _doc is getting early termination

Signed-off-by: gashutos <[email protected]>

---------

Signed-off-by: gashutos <[email protected]>
gbbafna pushed a commit that referenced this pull request Jun 13, 2023
…gh reverse segment read (#7244)] with fixes (#7967) (#8037)

Signed-off-by: gashutos <[email protected]>
gaiksaya pushed a commit to gaiksaya/OpenSearch that referenced this pull request Jun 26, 2023
…tion through re… (opensearch-project#7895)

* Revert "Time series based workload desc order optimization through reverse segment read (opensearch-project#7244)" (opensearch-project#7892)

This reverts commit 4c98b3d.

Reverting due to issue reported in opensearch-project#7878.

Signed-off-by: Andrew Ross <[email protected]>
(cherry picked from commit bb26536)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Remove unused imports

Signed-off-by: Andrew Ross <[email protected]>

---------

Signed-off-by: Andrew Ross <[email protected]>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Andrew Ross <[email protected]>
gaiksaya pushed a commit to gaiksaya/OpenSearch that referenced this pull request Jun 26, 2023
imRishN pushed a commit to imRishN/OpenSearch that referenced this pull request Jun 27, 2023
…gh reverse segment read (opensearch-project#7244)] with fixes (opensearch-project#7967)

* Revert "Revert "Time series based workload desc order optimization through reverse segment read (opensearch-project#7244)" (opensearch-project#7892)"

This reverts commit bb26536.

Signed-off-by: gashutos <[email protected]>

* Enable time series optimization only if it is not IndexSorted index, also ASC order reverse should only consider in @timestamp field

Signed-off-by: gashutos <[email protected]>

* Modifying CHANGELOG

Signed-off-by: gashutos <[email protected]>

* Adding integ test for scroll API where sort by _doc is getting early termination

Signed-off-by: gashutos <[email protected]>

---------

Signed-off-by: gashutos <[email protected]>
Signed-off-by: Rishab Nahata <[email protected]>
shiv0408 pushed a commit to Gaurav614/OpenSearch that referenced this pull request Apr 25, 2024
…gment read (opensearch-project#7244)

This commit changes the EngineConfig for timeseries indexes only (e.g., indexes that use
the @timestamp metadata field) so that a descending LeafSorter comparator is used to
visit segments in order of most newest to oldest. For the more infrequent case that a user
chooses to sort query results by ASC time, this would cause a search regression so the
ContextIndexSearcher is updated to inspect the sort order from the search request and
reverse the comparator so segments are visited in ascending order. LeafSorter behavior
for non-timeseries indexes is left the same.

Signed-off-by: gashutos <[email protected]>
Signed-off-by: Chaitanya Gohel <[email protected]>
Signed-off-by: Shivansh Arora <[email protected]>
shiv0408 pushed a commit to Gaurav614/OpenSearch that referenced this pull request Apr 25, 2024
…verse segment read (opensearch-project#7244)" (opensearch-project#7892)

This reverts commit 4c98b3d.

Reverting due to issue reported in opensearch-project#7878.

Signed-off-by: Andrew Ross <[email protected]>
Signed-off-by: Shivansh Arora <[email protected]>
shiv0408 pushed a commit to Gaurav614/OpenSearch that referenced this pull request Apr 25, 2024
…gh reverse segment read (opensearch-project#7244)] with fixes (opensearch-project#7967)

* Revert "Revert "Time series based workload desc order optimization through reverse segment read (opensearch-project#7244)" (opensearch-project#7892)"

This reverts commit bb26536.

Signed-off-by: gashutos <[email protected]>

* Enable time series optimization only if it is not IndexSorted index, also ASC order reverse should only consider in @timestamp field

Signed-off-by: gashutos <[email protected]>

* Modifying CHANGELOG

Signed-off-by: gashutos <[email protected]>

* Adding integ test for scroll API where sort by _doc is getting early termination

Signed-off-by: gashutos <[email protected]>

---------

Signed-off-by: gashutos <[email protected]>
Signed-off-by: Shivansh Arora <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 2.x Backport to 2.x branch
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants